Abstract. The application of high-end computing to astrophysical problems, mainly 
in the galactic environment, has been under development for many years at the Dep. 
of Physics of Sapienza Univ. of Roma. The main scientific topic is the physics of 
self-gravitating systems, with the following specific subtopics: i) celestial mechanics 
and interplanetary probe transfers in the solar system; ii) dynamics of globular clusters 
and of globular cluster systems in their parent galaxies; iii) nuclear cluster formation 
and evolution; iv) massive black hole formation and evolution; v) early evolution of 
young star clusters. In this poster we describe the software and hardware computational 
resources available in our group and how we are developing both software and hardware 
to reach the scientific aims itemized above. 



1. Introduction 

Celestial mechanics is one of the most classic examples of chaos in physics: mutually 
gravitating systems show chaotic behaviour, being extremely sensitive to differences 
in initial conditions. This problem can be only partially controlled using high-order 
integration algorithms. The intrinsic difficulty of the problem is summarized by the 
so-called double divergence of the pair interaction potential $U_{ij} \propto 1/r_{ij}$, 
where $r_{ij}$ is the distance between particle $i$ and particle $j$. 

The "ultraviolet" divergence corresponds to gravitational encounters at vanishing 
distance, while the "infrared" divergence means that the force never vanishes, however 
large the distance. The computational problems arising from these divergences make 
the classic gravitational $N$-body problem unique. 
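
As an illustration of where the two divergences enter a computation, the following is a minimal direct-summation sketch in Python. It is not taken from NBSymple: the function name, the units (G = 1 by default) and the softening length eps are assumptions introduced here. Softening is a common way to tame the small-distance divergence, and the text above does not state that it is the approach adopted.

```python
import math

def accelerations(pos, mass, G=1.0, eps=0.0):
    """Direct-summation accelerations for N point masses.

    pos  : list of (x, y, z) tuples
    mass : list of masses
    eps  : softening length (hypothetical; eps = 0 is the pure Newtonian force)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # "Infrared" divergence: there is no cutoff distance, so every
            # pair (i, j) contributes and the cost grows as N^2.
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps ** 2
            # "Ultraviolet" divergence: with eps = 0 this factor grows
            # without bound as the pair separation goes to zero.
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
                acc[j][k] -= G * mass[i] * d[k] * inv_r3
    return acc
```

For two unit masses separated by a distance of 2 along the x axis, the sketch gives accelerations of magnitude 0.25 pointing towards each other, as expected from the inverse-square law.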



2. The NBSymple code performance 

Here we present benchmark tests of our symplectic $N$-body code, NBSymple (Capuzzo- 
Dolcetta et al., 2011, New Astr., 16, 284), running on hybrid CPU+Graphic Processing 
Unit (GPU) architectures. Specifically, we ran some benchmarks on JAZZ, a hybrid 
CPU+GPU cluster managed by CASPUR (see http://www.caspur.it/en/). 

In Fig. 1 (left panel) we report the relative speedup, $S_n = T_p(1)/T_p(n)$, where 
$T_p(n)$ is the time spent using $n$ GPUs. The approximately linear speedup of our code 
is evident (the slope of the best fit is 0.97). 
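
The speedup definition and the best-fit slope can be sketched as follows. This is only an illustration: the helper names and the timings are hypothetical placeholders, not the measurements behind Fig. 1, and an ordinary least-squares fit is assumed because the fitting method is not specified above.

```python
def speedup(times):
    """Return {n: T(1) / T(n)} from a dict mapping GPU count n to wall time."""
    t1 = times[1]
    return {n: t1 / t for n, t in times.items()}

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical timings in seconds (NOT the values measured on JAZZ);
# they describe ideal scaling, for which the slope is exactly 1.
times = {1: 100.0, 2: 50.0, 4: 25.0, 8: 12.5}
s = speedup(times)
ns = sorted(s)
slope = fit_slope(ns, [s[n] for n in ns])
```

A slope close to 1, such as the 0.97 quoted above, means that each additional GPU contributes almost a full unit of speedup.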

The dependence of the actual speed of our code on $N$ is shown in the right panel 
of Fig. 1. The code performance (in GFlops) scales as 
Figure 1. Left panel: the speedup as a function of the number of GPUs used. 
The number of particles is $N = 1,966,080$. Right panel: NBSymple performance in 
double-single precision, for different values of $N$, using 11 nodes. 
where we counted $182 + 29N$ operations per single thread and $t_k$ is the time (in s) 
needed to complete a single computational kernel. The sustained performance is more 
than 11 TFlops. This is a very high value, although it was reached using a hybrid cluster 
that is small compared to the top 10 supercomputers in the world. It is worth underlining 
that the larger the number of stars (up to a certain threshold), the better the computational 
power of this kind of architecture is exploited. Indeed, for small $N$ the performance of 
the GPUs is not fully exploited, because the time spent in memory transfers becomes 
comparable to that spent in calculating interparticle accelerations. The best GPU 
performance is achieved when $N$ is large enough that all the GPU's CUDA cores are fully loaded. 
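
A rough way to turn the operation count quoted above into a performance figure is sketched below. The per-thread count of 182 + 29N operations comes from the text. The assumption of one thread per particle, so that a kernel performs N(182 + 29N) operations in total, is ours and may differ from the actual NBSymple kernel layout. The function names are hypothetical.

```python
def ops_per_thread(n):
    """Operation count per thread quoted in the text: 182 + 29 N."""
    return 182 + 29 * n

def gflops(n, t_k, n_threads=None):
    """Estimated performance in GFlops for one kernel lasting t_k seconds.

    ASSUMPTION: one thread per particle unless n_threads is given explicitly.
    """
    if n_threads is None:
        n_threads = n
    return n_threads * ops_per_thread(n) / t_k * 1e-9
```

Under this assumption the total operation count grows as N^2 for large N, while the data to be transferred grow only linearly with N. This is consistent with the remark above that memory transfers weigh on the performance mostly at small N.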



3. Conclusions 

Hybrid (CPU+GPU) cluster architectures are probably the best choice as a means 
of investigating the gravitational $N$-body problem. Our next stage of testing will thus 
compare GPUs from different manufacturers, which offer different features. Indeed, 
many GPUs now on the market can perform computations at nominally high speed 
and efficiency, but their actual suitability for large-scale physics computations 
still remains to be checked. 



